POWER is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by IBM. The name is an acronym for Performance Optimization With Enhanced RISC.[1]
POWER is also the name of a series of microprocessors that implement the POWER ISA. The POWER series microprocessors are used as the CPU in many of IBM's servers, minicomputers, workstations, and supercomputers. The POWER3 and subsequent microprocessors in the POWER series all implement the full 64-bit PowerPC architecture. The POWER3 and above do not implement any of the old POWER instructions that were removed from the ISA when the PowerPC ISA came out, nor any of the POWER2 extensions such as lfq or stfq.
Appendix E of Book I: PowerPC User Instruction Set Architecture of PowerPC Architecture Book, Version 2.02 describes the differences between the POWER and POWER2 instruction set architectures and the version of the PowerPC instruction set architecture implemented by the POWER5.
In 1974, IBM started a project with the design objective of creating a large telephone-switching network with the capacity to handle at least 300 calls per second. It was projected that 20,000 machine instructions would be required to handle each call while maintaining a real-time response, so a processor with a performance of 12 MIPS was deemed necessary. This requirement was extremely ambitious for the time, but it was realised that much of the complexity of contemporary CPUs could be dispensed with, since this machine would need only to perform I/O, branches, register-to-register addition, and data moves between registers and memory, and would have no need for special instructions to perform heavy arithmetic.
This simple design philosophy, whereby each step of a complex operation is specified explicitly by one machine instruction, and all instructions are required to complete in the same constant time, would later come to be known as RISC.
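The sizing estimate for the telephone switch can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the function name and the reading of the gap between the raw requirement and the 12 MIPS target as headroom are assumptions, since the text does not say how the target was derived.

```python
# Figures quoted in the text for the 1974 telephone-switch project.
CALLS_PER_SECOND = 300
INSTRUCTIONS_PER_CALL = 20_000
TARGET_MIPS = 12


def required_mips(calls_per_second: int, instructions_per_call: int) -> float:
    """Return the sustained instruction rate, in millions of instructions per second."""
    return calls_per_second * instructions_per_call / 1_000_000


baseline = required_mips(CALLS_PER_SECOND, INSTRUCTIONS_PER_CALL)
# Assumption: the difference between baseline and target is treated as headroom.
headroom = TARGET_MIPS / baseline

print(f"baseline requirement: {baseline} MIPS")  # 6.0 MIPS
print(f"target is {headroom}x the baseline")     # 2.0x
```

On these numbers the raw workload comes to 6 MIPS, so the 12 MIPS target is twice the minimum sustained rate.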
By 1975 the telephone switch project was canceled without a prototype. Estimates from simulations produced in the project's first year suggested, however, that the processor being designed could be a very promising general-purpose processor, so work continued at Thomas J. Watson Research Center building #801, on the 801 project.
For two years at the Watson Research Center, the superscalar limits of the 801 design were explored. The work examined the feasibility of using multiple functional units to improve performance, similar to what had been done in the IBM System/360 Model 91 and the CDC 6600 (although the Model 91 had been based on a CISC design). The aim was to determine whether a RISC machine could sustain multiple instructions per cycle, and what changes the 801 design would need to allow for multiple execution units.
To increase performance, the resulting design, Cheetah, had separate branch, fixed-point, and floating-point execution units. Many changes were made to the 801 design to allow for multiple execution units. Cheetah was originally planned to be manufactured using bipolar emitter-coupled logic (ECL) technology, but by 1984 complementary metal–oxide–semiconductor (CMOS) technology afforded an increase in the level of circuit integration while improving transistor-logic performance.
In 1985, research on a second-generation RISC architecture started at the IBM Thomas J. Watson Research Center, producing the "AMERICA architecture"; in 1986, IBM Austin started developing the RS/6000 series, based on that architecture.
Sometime between 1986 and 1989, the Bellatrix project was started, with the premise of using the America architecture as the base for a common architecture that could host OS/390 for mainframe applications, OS/400 for multi-processor server transactional processing, and AIX for scientific applications.
Sometime between 1990 and 1995, the project was judged overly ambitious and was canceled.
In February 1990, IBM introduced its first computers to incorporate the POWER Architecture ("Performance Optimization With Enhanced RISC"), the "RISC System/6000" or RS/6000. These RS/6000 computers were divided into two classes, workstations and servers, and hence introduced as the POWERstation and POWERserver. The RS/6000 CPU had two configurations, called the "RIOS-1" and "RIOS.9" (or more commonly the "POWER1" CPU). A RIOS-1 configuration had a total of 10 discrete chips: an instruction cache chip, fixed-point chip, floating-point chip, 4 data cache chips, storage control chip, input/output chip, and a clock chip. The lower-cost RIOS.9 configuration had 8 discrete chips: an instruction cache chip, fixed-point chip, floating-point chip, 2 data cache chips, storage control chip, input/output chip, and a clock chip.
A single-chip implementation of RIOS, RSC (for "RISC Single Chip"), was developed for lower-end RS/6000s; the first machines using RSC were released in 1992.
In 1990 the Amazon project was started to create a common architecture that would host both AIX and OS/400. The AS/400 engineering team at IBM was designing a RISC instruction set to replace the CISC instruction set of the existing AS/400 computers. Their original design was a variant of the existing "IMPI" instruction set, extended to 64 bits and given some RISC instructions to speed up the more computationally intensive commercial applications that were being put on AS/400s. IBM management wanted them to use PowerPC, but they resisted, arguing that the existing 32/64-bit PowerPC instruction set would not enable a viable transition for OS/400 software and that the existing instruction set required extensions for the commercial applications on the AS/400. Eventually, an extension to the PowerPC instruction set, called "Amazon", was developed.
At the same time, the RS/6000 developers were broadly expanding their product line to span low-end workstations, large enterprise SMP systems that competed with mainframes, and clustered RS/6000-SP2 supercomputing systems. PowerPC processors developed in the AIM alliance suited the low-end RISC workstation and small server space well. But mainframe-class and large clustered supercomputing systems required more performance and more RAS features than processors designed for Apple Power Macs could offer. Multiple processor designs were required to simultaneously meet the requirements of the cost-focused Apple Power Mac, the high-performance and RAS-focused RS/6000 systems, and the AS/400 transition to PowerPC.
Amazon was extended to support those features as well, so that processors could be designed for use in both high-end RS/6000 and AS/400 machines.
The project to develop the first such processor was "Bellatrix" (the name of a star in the Orion constellation, also called the "Amazon Star"). The Bellatrix project was extremely ambitious in its pervasive use of self-timed and pulse-based circuits and the EDA tools required to support this design strategy, and was eventually terminated. To address technical workstation, supercomputer, and engineering/scientific markets, IBM Austin (the home of the RS/6000s) then started developing a time-to-market single-chip version of the POWER2 (P2SC) in parallel with the development of a sophisticated 64-bit PowerPC processor with the POWER2 extensions and twin sophisticated MAF floating point units (the POWER3/630). To address RS/6000 commercial applications and AS/400 systems, IBM Rochester (the home of the AS/400s) started developing the first of the high-end 64-bit PowerPC processors with AS/400 extensions, and IBM Endicott started developing a low-end single-chip PowerPC processor with AS/400 extensions.
IBM started the POWER2 processor effort as a successor to the POWER1 two years before the creation of the 1991 Apple/IBM/Motorola alliance in Austin, Texas. Partly because resources were diverted to jump-start the Apple/IBM/Motorola effort, the POWER2 took five years from start to system shipment. By adding a second fixed-point unit, a second floating point unit, and other performance enhancements to the design, the POWER2 had leadership performance when it was announced in November 1993.
New instructions were also added to the instruction set:
To support the RS/6000 and RS/6000 SP2 product lines in 1996, IBM had its own design team implement a single-chip version of POWER2, the P2SC ("POWER2 Super Chip"), outside the Apple/IBM/Motorola alliance in IBM's most advanced and dense CMOS-6S process. P2SC combined all of the separate POWER2 instruction cache, fixed point, floating point, storage control, and data cache chips onto one huge die. At the time of its introduction, P2SC was the largest and highest transistor count processor in the industry. Despite the challenge of its size, complexity, and advanced CMOS process, the first tape-out version of the processor was good enough to ship, and it had leadership floating point performance at the time it was announced. P2SC was the processor used in the 1997 IBM Deep Blue chess playing supercomputer which beat chess grandmaster Garry Kasparov. With its twin sophisticated MAF floating point units and very wide, low-latency memory interfaces, P2SC was primarily targeted at engineering and scientific applications. P2SC was eventually succeeded by the POWER3, which added 64-bit support, SMP capability, and a full transition to PowerPC to P2SC's sophisticated twin MAF floating point units.
At some point in 1991, Apple Computer decided not to migrate its 68000-based software and hardware to Motorola's next generation 88xxx series microprocessor. Soon after, Apple, as one of Motorola's largest customers of desktop-class microprocessors, asked Motorola to join its discussions with IBM because of their long relationship, because Motorola had more extensive experience than IBM with manufacturing high-volume microprocessors, and so that Motorola could serve as a second source for the microprocessors. This three-way collaboration, based in Austin, Texas, became known as the AIM alliance, for Apple, IBM, Motorola.
After two years of development, the resulting PowerPC architecture was introduced in 1993. A modified version of the RSC architecture, PowerPC added single-precision floating point instructions and general register-to-register multiply and divide instructions, and removed some POWER features such as the specialized multiply and divide instructions using the MQ register. It also added a 64-bit version of the architecture and support for SMP.
IBM introduced the POWER3 processor in 1998. It implemented the 64-bit PowerPC instruction set, including all of the optional instructions of the ISA (at the time). All subsequent POWER processors implemented the full 64-bit PowerPC and POWER instruction sets, so that there were no longer any IBM processors that implemented only POWER or only POWER2.
IBM introduced the POWER4 processor, the first in the GIGA-Series, in 2001. Like the POWER3, it was a full 64-bit processor, implementing the full 64-bit PowerPC instruction set; it also had the AS/400 extensions, and was used in both RS/6000 and AS/400 systems, replacing both POWER3 and the RS64 processors. There was a new ISA release at this point called the PowerPC 2.00 ISA, which added a couple of extensions to the ISA, such as a version of mfcr which also took a field argument.
IBM introduced the POWER5 processor in 2004. It is a dual-core processor with support for simultaneous multithreading with two threads, so it implements 4 logical processors. Using the Virtual Vector Architecture, several POWER5 processors can act together as one vector processor. The POWER5 added more instructions to the ISA.
The POWER5+ added even more instructions, bringing the ISA to version 2.02.
POWER6 was announced on May 21, 2007. It adds VMX to the POWER series. It also introduces the second generation of IBM ViVA, ViVA-2. It is a dual-core design, reaching 5.0 GHz at 65 nm. It has very advanced interchip communication technology. Its power consumption is nearly the same as the preceding POWER5, whilst offering doubled performance.
POWER7 was released in February 2010 and was a substantial evolution from the POWER6 design, focusing more on power efficiency through multiple cores and simultaneous multithreading.
While the POWER6 features a dual-core processor, each capable of two-way simultaneous multithreading (SMT), the IBM POWER7 processor has eight cores, and four threads per core, for a total capacity of 32 simultaneous threads. Its power consumption is similar to the preceding POWER6, while quadrupling the number of cores, with each core having higher performance.
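The logical-processor counts quoted for these chips follow from multiplying cores by hardware threads per core. The sketch below tabulates that using only the core and thread counts stated in the text; the `Chip` class and its field names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical record of one processor's core and SMT configuration."""
    name: str
    cores: int
    threads_per_core: int

    @property
    def logical_processors(self) -> int:
        # Each hardware thread appears to software as one logical processor.
        return self.cores * self.threads_per_core


# Core and thread counts as stated in the text.
CHIPS = [
    Chip("POWER5", cores=2, threads_per_core=2),
    Chip("POWER6", cores=2, threads_per_core=2),
    Chip("POWER7", cores=8, threads_per_core=4),
]

for chip in CHIPS:
    print(f"{chip.name}: {chip.cores} cores x {chip.threads_per_core} threads "
          f"= {chip.logical_processors} logical processors")
```

This reproduces the 4 logical processors given for the POWER5 and the 32 simultaneous threads given for the POWER7.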
A successor to POWER7 is currently under development, with a focus on improved SMT, reliability, larger caches, accelerators, and more cores. It is to be built on a 22 nm process; no release date has been announced.[2]
The POWER design is descended directly from the earlier 801 CPU, widely considered to be the first true RISC processor design. The 801 was used in a number of applications inside IBM hardware.
At about the same time the PC/RT was being released, IBM started the America Project, to design the most powerful CPU on the market. They were interested primarily in fixing two problems in the 801 design:
Floating point became a focus for the America Project, and IBM was able to use new algorithms developed in the early 1980s that could support 64-bit double-precision multiplies and divides in a single cycle. The FPU portion of the design was separate from the instruction decoder and integer parts, allowing the decoder to send instructions to both the FPU and ALU (integer) execution units at the same time. IBM complemented this with a complex instruction decoder which could be fetching one instruction, decoding another, and sending one to the ALU and FPU at the same time, resulting in one of the first superscalar CPU designs in use.
The system used 32 32-bit integer registers and another 32 64-bit floating point registers, each in their own unit. The branch unit also included a number of "private" registers for its own use, including the program counter.
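The register layout described above can be modelled in a few lines. This is a minimal sketch of the register counts and widths only, not of any real IBM implementation: the `RegisterFile` class and its method names are hypothetical, the branch unit's private registers are left out, and storing floating-point values as IEEE 754 double bit patterns is an assumption made so that the 64-bit width is explicit.

```python
import struct

GPR_COUNT = 32   # integer registers, as stated in the text
GPR_BITS = 32
FPR_COUNT = 32   # floating-point registers, as stated in the text
FPR_BITS = 64


class RegisterFile:
    """Hypothetical model of the integer and floating-point register sets."""

    def __init__(self) -> None:
        self.gpr = [0] * GPR_COUNT   # each entry holds a 32-bit value
        self.fpr = [0] * FPR_COUNT   # each entry holds a 64-bit pattern

    def write_gpr(self, index: int, value: int) -> None:
        if not 0 <= index < GPR_COUNT:
            raise IndexError("no such integer register")
        # Wrap to the register width, as fixed-width hardware would.
        self.gpr[index] = value & ((1 << GPR_BITS) - 1)

    def write_fpr(self, index: int, value: float) -> None:
        if not 0 <= index < FPR_COUNT:
            raise IndexError("no such floating-point register")
        # Assumption: values are kept as IEEE 754 double-precision bits.
        self.fpr[index] = struct.unpack(">Q", struct.pack(">d", value))[0]

    def read_fpr(self, index: int) -> float:
        return struct.unpack(">d", struct.pack(">Q", self.fpr[index]))[0]


regs = RegisterFile()
regs.write_gpr(3, 2**32 + 5)   # wraps to 5 in a 32-bit register
regs.write_fpr(1, 1.0)
print(regs.gpr[3], hex(regs.fpr[1]), regs.read_fpr(1))
```

Keeping the two register sets in separate lists mirrors the text's point that each set lives in its own execution unit.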
Another interesting feature of the architecture is a virtual address system which maps all addresses into a 52-bit space. In this way applications can share memory in a "flat" 32-bit space, and each program can have its own distinct 32-bit block within the larger space.
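One way a 32-bit program address can be widened to 52 bits is through segment registers. The sketch below assumes a split that the text does not spell out: the top 4 bits of the 32-bit address select one of 16 segment registers, each holding a 24-bit segment ID, which is joined to the remaining 28-bit offset. The function and constant names are hypothetical.

```python
# Assumed field widths (not given in the text): 4 + 28 = 32 effective bits,
# 24 + 28 = 52 virtual bits.
SEGMENT_SELECT_BITS = 4
OFFSET_BITS = 28
SEGMENT_ID_BITS = 24


def to_virtual(effective_address: int, segment_registers: list) -> int:
    """Translate a 32-bit effective address to a 52-bit virtual address."""
    if not 0 <= effective_address < 1 << (SEGMENT_SELECT_BITS + OFFSET_BITS):
        raise ValueError("effective address must fit in 32 bits")
    if len(segment_registers) != 1 << SEGMENT_SELECT_BITS:
        raise ValueError("expected 16 segment registers")
    # The top 4 bits choose a segment register; the low 28 bits pass through.
    index = effective_address >> OFFSET_BITS
    offset = effective_address & ((1 << OFFSET_BITS) - 1)
    segment_id = segment_registers[index] & ((1 << SEGMENT_ID_BITS) - 1)
    return (segment_id << OFFSET_BITS) | offset


# Two programs use the same effective address but different segment IDs,
# so they land in different parts of the 52-bit space.
program_a = [0] * 16
program_b = [0] * 16
program_a[1] = 0x000123
program_b[1] = 0xABCDEF
print(hex(to_virtual(0x10000010, program_a)))
print(hex(to_virtual(0x10000010, program_b)))
```

Under this assumed layout, two programs share memory by loading the same segment ID and keep memory private by using different ones.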